🌦️ Weather Patterns Exploration (2010–2020)
Objective:
Analyze climate trends from 2010–2020, focusing on:
- 🌡️ Temperature
- 🌧️ Precipitation
- 🌬️ Wind Speed
- 💧 Humidity
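Dataset: hourly weather observations in weatherHistory.csv (sourced from Kaggle). The sample rows below begin in April 2006, so the file's date coverage should be checked against the 2010–2020 window.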
In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
plt.style.use('seaborn-v0_8')
In [2]:
df = pd.read_csv('weatherHistory.csv')
df.head()
Out[2]:
| | Formatted Date | Summary | Precip Type | Temperature (C) | Apparent Temperature (C) | Humidity | Wind Speed (km/h) | Wind Bearing (degrees) | Visibility (km) | Loud Cover | Pressure (millibars) | Daily Summary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2006-04-01 00:00:00.000 +0200 | Partly Cloudy | rain | 9.472222 | 7.388889 | 0.89 | 14.1197 | 251.0 | 15.8263 | 0.0 | 1015.13 | Partly cloudy throughout the day. |
| 1 | 2006-04-01 01:00:00.000 +0200 | Partly Cloudy | rain | 9.355556 | 7.227778 | 0.86 | 14.2646 | 259.0 | 15.8263 | 0.0 | 1015.63 | Partly cloudy throughout the day. |
| 2 | 2006-04-01 02:00:00.000 +0200 | Mostly Cloudy | rain | 9.377778 | 9.377778 | 0.89 | 3.9284 | 204.0 | 14.9569 | 0.0 | 1015.94 | Partly cloudy throughout the day. |
| 3 | 2006-04-01 03:00:00.000 +0200 | Partly Cloudy | rain | 8.288889 | 5.944444 | 0.83 | 14.1036 | 269.0 | 15.8263 | 0.0 | 1016.41 | Partly cloudy throughout the day. |
| 4 | 2006-04-01 04:00:00.000 +0200 | Mostly Cloudy | rain | 8.755556 | 6.977778 | 0.83 | 11.0446 | 259.0 | 15.8263 | 0.0 | 1016.51 | Partly cloudy throughout the day. |
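The first rows above are dated April 2006, which is earlier than the 2010–2020 window in the objective. One way to check the file's actual coverage and restrict the frame to a target window is sketched below. The `sample` rows are made up and the `in_window` helper is a hypothetical name, not part of this notebook.

```python
import pandas as pd

# Made-up rows in the same timestamp format as the 'Formatted Date' column.
sample = pd.DataFrame({
    "Formatted Date": [
        "2006-04-01 00:00:00.000 +0200",
        "2010-06-15 12:00:00.000 +0200",
        "2016-12-31 23:00:00.000 +0100",
    ],
    "Temperature (C)": [9.47, 24.1, -1.3],
})

# Parse to timezone-aware UTC timestamps; the local offset differs
# between summer (+0200) and winter (+0100), so utc=True unifies them.
parsed = pd.to_datetime(sample["Formatted Date"], utc=True)
print("Coverage:", parsed.min(), "to", parsed.max())


def in_window(dates: pd.Series, start_year: int, end_year: int) -> pd.Series:
    """Boolean mask of rows whose UTC year lies in [start_year, end_year]."""
    return dates.dt.year.between(start_year, end_year)


window = sample[in_window(parsed, 2010, 2020)]
print(len(window), "of", len(sample), "rows fall in 2010-2020")
```

Running the same check on the real frame shows whether the plot titles match the data that is actually plotted.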
In [3]:
print("Dataset Shape:", df.shape)
print("Columns:", df.columns)
df.info()
df.describe()
Dataset Shape: (96453, 12)
Columns: Index(['Formatted Date', 'Summary', 'Precip Type', 'Temperature (C)',
'Apparent Temperature (C)', 'Humidity', 'Wind Speed (km/h)',
'Wind Bearing (degrees)', 'Visibility (km)', 'Loud Cover',
'Pressure (millibars)', 'Daily Summary'],
dtype='object')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 96453 entries, 0 to 96452
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Formatted Date 96453 non-null object
1 Summary 96453 non-null object
2 Precip Type 95936 non-null object
3 Temperature (C) 96453 non-null float64
4 Apparent Temperature (C) 96453 non-null float64
5 Humidity 96453 non-null float64
6 Wind Speed (km/h) 96453 non-null float64
7 Wind Bearing (degrees) 96453 non-null float64
8 Visibility (km) 96453 non-null float64
9 Loud Cover 96453 non-null float64
10 Pressure (millibars) 96453 non-null float64
11 Daily Summary 96453 non-null object
dtypes: float64(8), object(4)
memory usage: 8.8+ MB
Out[3]:
| | Temperature (C) | Apparent Temperature (C) | Humidity | Wind Speed (km/h) | Wind Bearing (degrees) | Visibility (km) | Loud Cover | Pressure (millibars) |
|---|---|---|---|---|---|---|---|---|
| count | 96453.000000 | 96453.000000 | 96453.000000 | 96453.000000 | 96453.000000 | 96453.000000 | 96453.0 | 96453.000000 |
| mean | 11.932678 | 10.855029 | 0.734899 | 10.810640 | 187.509232 | 10.347325 | 0.0 | 1003.235956 |
| std | 9.551546 | 10.696847 | 0.195473 | 6.913571 | 107.383428 | 4.192123 | 0.0 | 116.969906 |
| min | -21.822222 | -27.716667 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.000000 |
| 25% | 4.688889 | 2.311111 | 0.600000 | 5.828200 | 116.000000 | 8.339800 | 0.0 | 1011.900000 |
| 50% | 12.000000 | 12.000000 | 0.780000 | 9.965900 | 180.000000 | 10.046400 | 0.0 | 1016.450000 |
| 75% | 18.838889 | 18.838889 | 0.890000 | 14.135800 | 290.000000 | 14.812000 | 0.0 | 1021.090000 |
| max | 39.905556 | 39.344444 | 1.000000 | 63.852600 | 359.000000 | 16.100000 | 0.0 | 1046.380000 |
In [4]:
df.isnull().sum()
Out[4]:
Formatted Date                0
Summary                       0
Precip Type                 517
Temperature (C)               0
Apparent Temperature (C)      0
Humidity                      0
Wind Speed (km/h)             0
Wind Bearing (degrees)        0
Visibility (km)               0
Loud Cover                    0
Pressure (millibars)          0
Daily Summary                 0
dtype: int64
In [5]:
df = df.dropna()
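`dropna()` removes every row that has any missing value, which here means the 517 rows without a `Precip Type`. If those hours should still count toward the temperature, wind and humidity statistics, one alternative is to label them explicitly instead of dropping them. This is a sketch on made-up rows; the label `'none'` is an assumption and not a value taken from the dataset.

```python
import numpy as np
import pandas as pd

# Made-up rows: two of the four have no precipitation type recorded.
toy = pd.DataFrame({
    "Precip Type": ["rain", np.nan, "snow", np.nan],
    "Temperature (C)": [9.5, 21.0, -2.0, 18.5],
})

# Option 1: drop only rows where 'Precip Type' is missing.
dropped = toy.dropna(subset=["Precip Type"])

# Option 2: keep all rows and mark the gap with an explicit (assumed) label.
labeled = toy.fillna({"Precip Type": "none"})

print(len(dropped), "rows after dropping;", len(labeled), "rows after labeling")
print(labeled["Precip Type"].value_counts())
```

Which option fits depends on whether a missing type means "no precipitation" or "not recorded"; the file alone does not say.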
In [6]:
df = df.rename(columns={
'Formatted Date': 'Date',
'Temperature (C)': 'Temperature',
'Apparent Temperature (C)': 'FeelsLike',
'Wind Speed (km/h)': 'WindSpeed',
'Wind Bearing (degrees)': 'WindBearing',
'Visibility (km)': 'Visibility',
'Pressure (millibars)': 'Pressure'
})
df.columns
Out[6]:
Index(['Date', 'Summary', 'Precip Type', 'Temperature', 'FeelsLike',
'Humidity', 'WindSpeed', 'WindBearing', 'Visibility', 'Loud Cover',
'Pressure', 'Daily Summary'],
dtype='object')
In [7]:
df.index
Out[7]:
Index([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
...
96443, 96444, 96445, 96446, 96447, 96448, 96449, 96450, 96451, 96452],
dtype='int64', length=95936)
🌡️ Temperature Trends
We analyze temperature variations across years and months.
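Hourly readings produce a very dense line, so aggregating to daily or monthly means before plotting often makes the trend easier to read. The sketch below uses `resample` on a synthetic hourly series; the sine-shaped values and the name `temps` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic hourly temperatures for three full days (illustrative values only).
hours = pd.date_range("2010-01-01", periods=72, freq="h", tz="UTC")
daily_cycle = 5 * np.sin(2 * np.pi * hours.hour.to_numpy() / 24)
temps = pd.Series(10 + daily_cycle, index=hours, name="Temperature")

# Collapse the 24 hourly values of each calendar day into one mean.
daily = temps.resample("D").mean()
print(daily)
```

`resample` needs a `DatetimeIndex`, so on the real frame the timestamps have to be parsed from the `Date` column first.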
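In [9]:
# Plot against parsed timestamps; df.index is still the integer row index here.
dates = pd.to_datetime(df['Date'], errors='coerce', utc=True)
plt.figure(figsize=(12,6))
plt.plot(dates, df['Temperature'], color='orange', label='Temperature (C)')
plt.xlabel("Date")
plt.ylabel("Temperature (°C)")
plt.title("Temperature Trends Over Time (2010–2020)")
plt.legend()
plt.show()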
In [10]:
# Parse the 'Date' column, not the integer index, into a UTC DatetimeIndex.
# Passing df.index would read 0..96452 as nanoseconds since the epoch
# and place every row at 1970-01-01.
df.index = pd.to_datetime(df['Date'], errors='coerce', utc=True)
In [11]:
df.index[:3]
Out[11]:
DatetimeIndex(['2006-03-31 22:00:00+00:00', '2006-03-31 23:00:00+00:00',
'2006-04-01 00:00:00+00:00'],
dtype='datetime64[ns, UTC]', name='Date', freq=None)
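In [12]:
# Derive Year/Month from the parsed 'Date' column so they do not depend on the index.
dates = pd.to_datetime(df['Date'], errors='coerce', utc=True)
df['Year'] = dates.dt.year
df['Month'] = dates.dt.month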
In [13]:
monthly_avg = df.groupby('Month')['Temperature'].mean()
plt.figure(figsize=(10,5))
monthly_avg.plot(kind='bar', color='skyblue')
plt.xlabel("Month")
plt.ylabel("Avg Temperature (°C)")
plt.title("Average Monthly Temperature (2010–2020)")
plt.show()
In [14]:
yearly_avg = df.groupby('Year')['Temperature'].mean()
plt.figure(figsize=(10,5))
yearly_avg.plot(marker='o', color='green')
plt.xlabel("Year")
plt.ylabel("Avg Temperature (°C)")
plt.title("Yearly Average Temperature (2010–2020)")
plt.grid(True)
plt.show()
In [15]:
heatmap = df.pivot_table(values='Temperature', index='Month', columns='Year', aggfunc='mean')
plt.figure(figsize=(12,6))
sns.heatmap(heatmap, cmap="coolwarm", annot=False)
plt.title("Monthly Temperature Heatmap (2010–2020)")
plt.show()
🌧️ Precipitation Trends
We study rainfall/snowfall distribution over the years.
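Each row is one hourly observation, so counting rows per precipitation type counts hours, not days. If day-level counts are wanted, one approach is to assign each calendar day its most frequent type and then count days. A sketch on made-up observations; the names `obs` and `daily_type` are illustrative.

```python
import pandas as pd

# Made-up hourly observations spread over three days.
idx = pd.to_datetime([
    "2010-01-01 00:00", "2010-01-01 01:00", "2010-01-01 02:00",
    "2010-01-02 00:00", "2010-01-02 01:00",
    "2010-01-03 00:00",
], utc=True)
obs = pd.DataFrame(
    {"Precip Type": ["snow", "snow", "rain", "rain", "rain", "snow"]},
    index=idx,
)

# Row-level counts: one count per hourly observation.
hourly_counts = obs["Precip Type"].value_counts()

# Day-level counts: the most frequent type within each calendar day.
daily_type = obs.groupby(obs.index.normalize())["Precip Type"].agg(
    lambda s: s.mode().iloc[0]
)
daily_counts = daily_type.value_counts()

print(hourly_counts)
print(daily_counts)
```

Taking the mode is only one possible rule; a day could also be counted under every type that occurred on it.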
In [16]:
precip_counts = df['Precip Type'].value_counts()
plt.figure(figsize=(6,4))
precip_counts.plot(kind='bar', color=['blue','gray','orange'])
plt.xlabel("Precipitation Type")
plt.ylabel("Count")
plt.title("Distribution of Precipitation Types")
plt.show()
In [17]:
precip_yearly = df.groupby(['Year','Precip Type']).size().unstack(fill_value=0)
precip_yearly.plot(kind='bar', stacked=True, figsize=(12,6))
plt.xlabel("Year")
plt.ylabel("Number of Hourly Observations")
plt.title("Yearly Precipitation Types (2010–2020)")
plt.legend(title="Precip Type")
plt.show()
In [18]:
fig = px.histogram(df, x="Year", color="Precip Type",
title="Yearly Precipitation Distribution (Interactive)",
barmode="stack")
fig.show()
🌬️ Wind Speed & 💧 Humidity
We examine how wind speed and humidity behave alongside temperature.
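A scatter plot shows the shape of a relationship; a correlation coefficient puts a number on it. The sketch below computes Pearson correlations on synthetic data in which humidity is constructed to fall as temperature rises. All values and coefficients are made up; only the column names mirror the renamed frame.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Synthetic data: humidity decreases with temperature plus noise (assumed relation).
temperature = rng.uniform(-5, 35, size=n)
humidity = np.clip(0.95 - 0.015 * temperature + rng.normal(0, 0.05, size=n), 0, 1)
wind_speed = rng.uniform(0, 30, size=n)  # generated independently of the other two

toy = pd.DataFrame({
    "Temperature": temperature,
    "Humidity": humidity,
    "WindSpeed": wind_speed,
})

# Pairwise Pearson correlation between the numeric columns.
corr = toy.corr(method="pearson")
r = corr.loc["Temperature", "Humidity"]
print(corr.round(2))
print("Temperature vs Humidity:", round(r, 2))
```

On the real frame the same call would be `df[['Temperature', 'Humidity', 'WindSpeed']].corr()`.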
In [19]:
yearly_wind = df.groupby('Year')['WindSpeed'].mean()
plt.figure(figsize=(10,5))
yearly_wind.plot(marker='o', color='purple')
plt.xlabel("Year")
plt.ylabel("Avg Wind Speed (km/h)")
plt.title("Yearly Average Wind Speed (2010–2020)")
plt.grid(True)
plt.show()
In [20]:
plt.figure(figsize=(8,5))
sns.histplot(df['WindSpeed'], bins=30, kde=True, color='skyblue')
plt.xlabel("Wind Speed (km/h)")
plt.ylabel("Frequency")
plt.title("Wind Speed Distribution")
plt.show()
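In [21]:
plt.figure(figsize=(12,6))
# Use parsed timestamps on the x-axis; Humidity is stored as a 0–1 fraction, not a percentage.
plt.plot(pd.to_datetime(df['Date'], errors='coerce', utc=True), df['Humidity'], color='teal', alpha=0.5, label='Humidity')
plt.xlabel("Date")
plt.ylabel("Humidity (0–1)")
plt.title("Humidity Trends Over Time (2010–2020)")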
plt.legend()
plt.show()
In [22]:
plt.figure(figsize=(8,6))
sns.scatterplot(x='Temperature', y='Humidity', data=df, alpha=0.3)
plt.xlabel("Temperature (°C)")
plt.ylabel("Humidity (0–1)")
plt.title("Temperature vs Humidity")
plt.show()
Key Insights & Conclusion
- Temperature: Yearly averages show an overall warming trend after 2015.
- Precipitation: Rain is far more common than snow.
- Wind Speed: Yearly average wind speeds are steady, with peaks in 2016–2018.
- Humidity: Negatively correlated with temperature (hotter hours tend to have drier air).
These summaries are a starting point for climate monitoring and for choosing features in predictive models.
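A trend claim such as the warming noted above can be quantified as a slope in °C per year by fitting a straight line to the yearly means. A minimal sketch with `np.polyfit`; the yearly values are made up and do not come from the dataset.

```python
import numpy as np
import pandas as pd

# Made-up yearly mean temperatures (°C), for illustration only.
yearly_avg = pd.Series(
    [11.2, 11.0, 11.5, 11.4, 11.9, 12.1],
    index=[2010, 2011, 2012, 2013, 2014, 2015],
    name="Temperature",
)

# Center the years so the fit is well conditioned; the slope is unaffected.
years = yearly_avg.index.to_numpy(dtype=float)
centered = years - years.mean()

# Least-squares line: temperature ≈ slope * (year - mean_year) + intercept.
slope, intercept = np.polyfit(centered, yearly_avg.to_numpy(), deg=1)
print(f"Trend: {slope:.3f} °C per year")
```

With only a handful of yearly points the slope is sensitive to single years, so it is better read as a descriptive summary than as a significance test.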